A Randomized Exhaustive Propositionalization Approach for Molecule Classification

Authors

  • Michele Samorani
  • Manuel Laguna
  • Robert Kirk DeLisle
  • Daniel C. Weaver
Abstract

Drug discovery is the process of designing compounds that have desirable properties, such as activity and non-toxicity. Molecule classification techniques are used throughout this process to predict the properties of the compounds in order to expedite their testing. Ideally, the classification rules found should be accurate and reveal novel chemical properties, but current molecule representation techniques lead to less than adequate accuracy and knowledge discovery. This work extends the propositionalization approach recently proposed for multi-relational data mining in two ways: it generates expressive attributes exhaustively and it uses randomization to sample a limited set of complex (“deep”) attributes. Our experimental tests show that the procedure is able to generate meaningful and interpretable attributes from molecular structural data, and that these features are effective for classification purposes.
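
The two ideas named in the abstract, exhaustive generation of simple attributes and random sampling of deeper ones, can be illustrated with a toy sketch. This is not the authors' procedure: the data, the aggregate functions, the charge thresholds, and every function name below are hypothetical, chosen only to show how a molecule-to-atoms relation might be flattened into one attribute row per molecule.

```python
import random
from itertools import product
from statistics import mean

# Hypothetical toy data: each molecule is related to a list of atoms.
MOLECULES = {
    "m1": [{"element": "C", "charge": -0.1}, {"element": "O", "charge": -0.4}],
    "m2": [{"element": "C", "charge": 0.2}, {"element": "N", "charge": -0.3},
           {"element": "C", "charge": 0.1}],
}

# Assumed building blocks for attributes (illustrative only).
AGGREGATES = {"count": len, "min": min, "max": max, "mean": mean}
ELEMENTS = ["C", "N", "O"]
THRESHOLDS = [-0.3, -0.1, 0.0, 0.1]


def shallow_attributes():
    """Enumerate every (aggregate, element) pair exhaustively."""
    return list(product(AGGREGATES, ELEMENTS))


def sample_deep_attributes(k, seed=0):
    """Randomly sample k (aggregate, element, threshold) attributes
    from a larger space instead of enumerating all of them."""
    space = list(product(AGGREGATES, ELEMENTS, THRESHOLDS))
    return random.Random(seed).sample(space, k)


def evaluate(attribute, atoms):
    """Compute one attribute for one molecule.

    A deep attribute carries an extra threshold: only atoms whose
    charge exceeds it are aggregated. Returns None when no atom
    matches and the aggregate is undefined.
    """
    agg, element, *rest = attribute
    charges = [a["charge"] for a in atoms if a["element"] == element]
    if rest:
        charges = [c for c in charges if c > rest[0]]
    if not charges and agg != "count":
        return None
    return AGGREGATES[agg](charges)


def propositionalize(molecules, attributes):
    """Flatten the relational data into one attribute row per molecule."""
    return {name: [evaluate(attr, atoms) for attr in attributes]
            for name, atoms in molecules.items()}


attributes = shallow_attributes() + sample_deep_attributes(5)
table = propositionalize(MOLECULES, attributes)
```

Each row of `table` can then be passed to any standard attribute-value classifier. The fixed seed only makes the sampled attributes reproducible; how deep attributes are actually constructed and sampled in the paper is not specified in this abstract.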

Similar resources

Clustering Relational Data Based on Randomized Propositionalization

Clustering of relational data has so far received a lot less attention than classification of such data. In this paper we investigate a simple approach based on randomized propositionalization, which allows for applying standard clustering algorithms like KMeans to multi-relational data. We describe how random rules are generated and then turned into boolean-valued features. Clustering generall...

Flexible propositionalization of continuous attributes in relational data mining

In a relational database, data are stored in primary and secondary tables. Propositionalization can transform a relational database into a single attribute-value table, and hence becomes a useful technique for mining relational databases. However, most of the existing propositionalization approaches deal with categorical attributes, and cannot handle a threshold on an attribute and a threshold ...

Propositionalization of Relational Learning: An Information Extraction Case Study

This paper develops a new propositionalization approach for relational learning which allows for efficient representation and learning of relational information using propositional means. We develop a relational representation language, along with a relation generation function that produces features in this language in a data driven way; together, these allow efficient representation of the re...

A Link-Based Method for Propositionalization

Propositionalization, a popular technique in Inductive Logic Programming, aims at converting a relational problem into an attribute-value one. An important facet of propositionalization consists in building a set of relevant features. To this end we propose a new method, based on a synthetic representation of the database, modeling the links between connected ground atoms. Comparing it to two st...

Propositionalization Through Relational Association Rules Mining

In this paper we propose a novel (multi-)relational classification framework based on propositionalization. Propositionalization makes use of discovered relational association rules and makes it possible to significantly reduce the feature space through a feature reduction algorithm. The method is implemented in a Data Mining system tightly integrated with a relational database. It performs the classificatio...

Journal:
  • INFORMS Journal on Computing

Volume 23

Publication date: 2011